This work deals with the ill-posed inverse problem of reconstructing a function $f$ given implicitly as the solution of $g = Af$, where $A$ is a compact linear operator with unknown singular values and known eigenfunctions. We observe the function $g$ and the singular values of the operator subject to Gaussian white noise with respective noise levels $\varepsilon$ and $\sigma$. We develop a minimax theory in terms of both noise levels and propose an orthogonal series estimator attaining the minimax rates. This estimator requires the optimal choice of a dimension parameter depending on certain characteristics of $f$ and $A$. This work addresses the fully data-driven choice of the dimension parameter, combining model selection with Lepski's method. We show that the fully data-driven estimator preserves minimax optimality over a wide range of classes for $f$ and $A$ and noise levels $\varepsilon$ and $\sigma$. The results are illustrated considering Sobolev spaces and mildly and severely ill-posed inverse problems.
1. Introduction

Let $(H, \langle\cdot,\cdot\rangle_H)$ and $(G, \langle\cdot,\cdot\rangle_G)$ be separable Hilbert spaces and $A$ a compact linear operator from $H$ to $G$ with unknown singular values. This work deals with the reconstruction of a function $f \in H$ given noisy observations of the image $g = Af$ on the one hand and of the unknown sequence of singular values $a = (a_j)_{j\in\mathbb N}$ on the other hand. In other words, we consider a statistical inverse problem with partially unknown operator. There is a vast literature on statistical inverse problems. For the case where the operator is fully known, the reader may refer to Johnstone and Silverman (1990), Mair and Ruymgaart (1996), Mathé and Pereverzev (2001), and Cavalier et al. (2002) and the references therein. A typical illustration of such a situation is a deconvolution problem (cf. Ermakov (1990), Stefanski and Carroll (1990), and Fan (1991) among many others). For a more detailed discussion and motivation of the case of a partially unknown operator, which we consider in this work, we refer the reader to Cavalier and Hengartner (2005). Efromovich (1997) and Neumann (1997) consider such a setting in the particular context of a density deconvolution problem.

Let us describe in more detail the model we are going to consider. We suppose that $A$ admits a singular value decomposition $(a_j, \phi_j, \psi_j)_{j\in\mathbb N}$ as follows. Denote by $A^*$ the adjoint operator of $A$. Then $A^*A$ is a compact operator on $H$ with eigenvalues $(a_j^2)_{j\in\mathbb N}$ whose associated orthonormal basis of eigenfunctions $\{\phi_j\}$ we suppose to be known. Analogously, the operator $AA^*$ has eigenvalues $(a_j^2)_{j\in\mathbb N}$ and known orthonormal eigenfunctions $\psi_j = \|A\phi_j\|_G^{-1}A\phi_j$ in $G$. Projecting the inverse problem $g = Af$ onto the eigenfunctions, we obtain the system of equations $[g]_j := \langle g, \psi_j\rangle_G = a_j\langle f, \phi_j\rangle_H$ for $j \in \mathbb N$. As the operator $A$ is compact, the sequence of singular values tends to zero and the inverse problem is called ill-posed.

The solution $f$ is characterized by its coefficients $[f]_j := \langle f, \phi_j\rangle_H$. Our objective is their estimation based on the following observations:

$$Y_j = [g]_j + \sqrt{\varepsilon}\,\xi_j = a_j[f]_j + \sqrt{\varepsilon}\,\xi_j \quad\text{and}\quad X_j = a_j + \sqrt{\sigma}\,\eta_j \qquad (j\in\mathbb N), \tag{1.1}$$

where the $\xi_j, \eta_j$ are iid. standard normally distributed random variables and $\varepsilon, \sigma \in (0,1)$ are noise levels. Thus we represent the problem at hand as a hierarchical Gaussian sequence space model. Of course, $f$ can only be reconstructed from such observations if all the $a_j$ are non-zero, which is the case if and only if the operator $A$ is injective. We assume this from now on, which allows us to write $f = \sum_{j=1}^\infty [g]_j a_j^{-1}\phi_j$. Hence, an orthogonal series estimator of $f$ is a natural approach:

$$\hat f_k := \sum_{j=1}^k \frac{Y_j}{X_j}\,\mathbb 1\{X_j^2 \ge \sigma\}\,\phi_j \qquad (k\in\mathbb N).$$

The threshold using the indicator function accounts for the uncertainty caused by estimating the $a_j$ by $X_j$. It corresponds to the noise level of $X_j$ as an estimator of $a_j$, which is a natural choice (cf. Neumann, 1997, p. 310f.). Note that $\hat f_k$ depends on a dimension parameter $k$ whose choice essentially determines the estimation accuracy. Its optimal choice generally depends on both unknown sequences $([f]_j)$ and $(a_j)$. Our purpose is to establish an adaptive estimation procedure for the function $f$ which does not depend on these sequences. More precisely, assuming that the solution and the operator belong to given classes $f \in \mathcal F$ and $A \in \mathcal A$, respectively, we shall measure the accuracy of an estimator $\tilde f$ of $f$ by the maximal weighted risk $\mathcal R_\omega(\tilde f, \mathcal F, \mathcal A) := \sup_{f\in\mathcal F}\sup_{A\in\mathcal A}\mathbb E\|\tilde f - f\|_\omega^2$, defined with respect to the weighted norm $\|h\|_\omega^2 := \sum_{j\in\mathbb N}\omega_j|[h]_j|^2$, where $\omega := (\omega_j)_{j\in\mathbb N}$ is a strictly positive weight sequence. This allows us to quantify the estimation accuracy in terms of the mean integrated squared error (MISE) not only of $f$ itself, but also of its derivatives, for example. Given observations $Y = (Y_j)_{j\in\mathbb N}$ and $X = (X_j)_{j\in\mathbb N}$ with respective noise levels $\varepsilon$ and $\sigma$ according to (1.1), the minimax risk with respect to the classes $\mathcal F$ and $\mathcal A$ is then defined as $\mathcal R^*_\omega(\varepsilon,\sigma,\mathcal F,\mathcal A) := \inf_{\tilde f}\mathcal R_\omega(\tilde f,\mathcal F,\mathcal A)$, where the infimum is taken over all possible estimators $\tilde f$ of $f$. An estimator $\tilde f$ is said to attain the minimax rate, or to be minimax optimal with respect to $\mathcal F$ and $\mathcal A$, if there is a constant $C > 0$ depending on the classes only such that $\mathcal R_\omega(\tilde f,\mathcal F,\mathcal A) \le C\,\mathcal R^*_\omega(\varepsilon,\sigma,\mathcal F,\mathcal A)$ for all $\varepsilon,\sigma\in(0,1)$. An estimation procedure which is fully data-driven and minimax optimal for a wide range of classes $\mathcal F$ and $\mathcal A$ is called adaptive.
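The following Python sketch is one way to experiment with model (1.1) and the estimator $\hat f_k$. It assumes that only the first $J$ coefficients are simulated (a truncation of our own choosing) and works directly with coefficient sequences in the basis $\{\phi_j\}$. The function names are illustrative and not taken from the paper.

```python
import math
import random


def simulate(f_coef, a, eps, sigma, rng):
    """One draw from the truncated model (1.1):
    Y_j = a_j [f]_j + sqrt(eps) * xi_j and X_j = a_j + sqrt(sigma) * eta_j."""
    Y = [aj * fj + math.sqrt(eps) * rng.gauss(0.0, 1.0) for aj, fj in zip(a, f_coef)]
    X = [aj + math.sqrt(sigma) * rng.gauss(0.0, 1.0) for aj in a]
    return Y, X


def series_estimator(Y, X, k, sigma):
    """Coefficients of hat f_k: Y_j / X_j if j <= k and X_j^2 >= sigma, else 0.
    List position i corresponds to the index j = i + 1."""
    return [Y[i] / X[i] if i < k and X[i] ** 2 >= sigma else 0.0
            for i in range(len(Y))]


def weighted_sq_norm(coef, omega):
    """||h||_omega^2 = sum_j omega_j * [h]_j^2 for a coefficient list."""
    return sum(w * c * c for w, c in zip(omega, coef))


if __name__ == "__main__":
    rng = random.Random(0)
    J, eps, sigma = 200, 1e-4, 1e-4
    f = [float(j) ** -3 for j in range(1, J + 1)]   # hypothetical solution
    a = [float(j) ** -1 for j in range(1, J + 1)]   # a_j^2 = j^(-2): mildly ill-posed
    omega = [1.0] * J                               # omega_j = 1: plain L^2 loss
    Y, X = simulate(f, a, eps, sigma, rng)
    for k in (2, 5, 10, 20, 50):
        diff = [u - v for u, v in zip(series_estimator(Y, X, k, sigma), f)]
        print(k, weighted_sq_norm(diff, omega))
```

In this hypothetical example, small $k$ typically leaves a large bias, while large $k$ lets the division by the noisy $X_j$ inflate the variance; this is the trade-off that the choice of $k$ has to resolve.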

In the next section, we show that for a wide range of classes $\mathcal F$ and $\mathcal A$ the orthogonal series estimator $\hat f_{k^*}$ attains the minimax rate for an optimal choice $k^*$ of the dimension parameter. We illustrate this result considering subsets of Sobolev spaces for $\mathcal F$ and distinguishing two types of operator classes $\mathcal A$ specifying the decay of the singular values: if $(a_j)$ decays polynomially, the inverse problem is called mildly ill-posed, and severely ill-posed if it decays exponentially. However, $k^*$ is chosen subject to a classical variance-squared-bias trade-off and depends on properties of both classes $\mathcal F$ and $\mathcal A$, which are unknown in general.

The last section is devoted to the development of a data-driven choice $\hat k$ of $k$, following the general model selection scheme (cf. Barron et al., 1999). This methodology requires the careful choice of a contrast function and a penalty term. In this work, we use a contrast function inspired by the work of Goldenshluger and Lepski (2011), who consider bandwidth selection for kernel estimators. Given a random sequence $(\mathrm{pen}_k)_{k\ge1}$ of penalties, a random set $\{1,\dots,K_{\varepsilon,\sigma}\}$ of admissible dimension parameters and the random sequence of contrasts

$$\Psi_k := \max_{k\le j\le K_{\varepsilon,\sigma}}\big(\|\hat f_j - \hat f_k\|_\omega^2 - \mathrm{pen}_j\big)_+ \qquad (k\in\mathbb N), \tag{1.2}$$

the dimension parameter is selected as the minimizer¹ of a penalized contrast,

$$\hat k := \operatorname*{argmin}_{1\le k\le K_{\varepsilon,\sigma}}\{\Psi_k + \mathrm{pen}_k\}. \tag{1.3}$$

We assess the accuracy of the fully data-driven estimator $\hat f_{\hat k}$ by deriving an upper bound for $\mathcal R_\omega(\hat f_{\hat k},\mathcal F,\mathcal A)$. Obviously this upper bound heavily depends on the random sequence $(\mathrm{pen}_k)$ and the random upper bound $K_{\varepsilon,\sigma}$. However, we construct these objects in such a way that the resulting fully data-driven estimator $\hat f_{\hat k}$ is minimax optimal over a wide range of classes and thus adaptive. The more technical proofs and some auxiliary results are deferred to the appendix.

Hoffmann and Reiss (2008) also study adaptive estimation in linear inverse problems, but their method 
is limited to mildly ill-posed inverse problems with known degree of ill-posedness. Also, the theoretical 
framework is quite different: they focus on sparse representations and therefore consider estimators 
based on wavelet thresholding and show their rate-optimality and adaptivity properties over Besov 
spaces with respect to the corresponding norms. 

Adaptive estimation in a hierarchical Gaussian sequence space model has previously been considered by Cavalier and Hengartner (2005). However, the authors restrict their investigation to the mildly ill-posed case and to noise levels satisfying $\sigma \le \varepsilon$. The new approach presented in this paper has the advantage of not requiring such restrictions. On the contrary, the influence of the two noise levels on the estimation accuracy is characterized. Moreover, the estimator presented in this paper can attain optimal convergence rates independently of whether the underlying inverse problem is mildly or severely ill-posed, for example even when $\varepsilon \ll \sigma$. This is an important feature in applications where the reduction of the noise level $\sigma$ can be costly. In (satellite or medical) imaging, for example, the observation of the sequence $X$ may correspond to calibration measurements. In order to make these measurements precise enough to reduce the noise level $\sigma$ sufficiently, one might have to repeat expensive experiments. It is thus desirable to know how the estimator performs when $\sigma$ exceeds $\varepsilon$.

2. Minimax 

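In this section we develop a minimax theory for Gaussian inverse regression with respect to the classes

$$\mathcal F_\gamma^r := \Big\{h\in H \;\Big|\; \sum_{j\in\mathbb N}\gamma_j|[h]_j|^2 =: \|h\|_\gamma^2 \le r\Big\} \quad\text{and}$$

$$\mathcal A_\lambda^d := \Big\{T\in\mathcal C(H,G) \;\Big|\; \text{the eigenvalues } (\nu_j) \text{ of } T^*T \text{ satisfy } 1/d \le \nu_j/\lambda_j \le d \;\; \forall j\in\mathbb N\Big\},$$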



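¹ For a sequence $(b_n)_{n\in\mathbb N}$ attaining a minimal value on $\mathcal N\subset\mathbb N$, let $\operatorname{argmin}_{n\in\mathcal N} b_n := \min\{n\in\mathcal N \mid b_n \le b_k\ \forall k\in\mathcal N\}$.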
where $\mathcal C(H,G)$ denotes the set of all compact linear operators from $H$ to $G$ having $\{\phi_j\}$ and $\{\psi_j\}$ as eigenfunctions, respectively. The minimal regularity conditions on the solution, the operator and the weighted norm $\|\cdot\|_\omega$ which we need in this section are summarized in the following assumption.

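Assumption 2.1 Let $\gamma := (\gamma_j)_{j\in\mathbb N}$, $\omega := (\omega_j)_{j\in\mathbb N}$ and $\lambda := (\lambda_j)_{j\in\mathbb N}$ be strictly positive sequences of weights with $\gamma_1 = \omega_1 = \lambda_1 = 1$ such that $\omega/\gamma$ and $\lambda$ are non-increasing, respectively.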

Illustration 2.2 As an illustration of the results below, we will consider weight sequences $\gamma_j = j^{2p}$, for which $\mathcal F_\gamma^r$ is a Sobolev space of $p$-times differentiable functions if we consider the trigonometric basis in $H = L^2[0,1]$. As for the operator, we will distinguish the cases $\lambda_j = j^{-2b}$, referred to as mildly ill-posed ([m]), and $\lambda_j = \exp(-j^{2b})$, the severely ill-posed case ([s]). Concerning the weighted norm, we will consider sequences² $\omega_j \sim j^{2s}$, such that $\|f\|_\omega = \|f^{(s)}\|_{L^2}$ for all $f\in\mathcal F_\gamma^r$. We will assume that $b \ge 0$ and $p \ge s \ge 0$, such that Assumption 2.1 is satisfied.
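A small sketch of the sequences in Illustration 2.2, together with a check of Assumption 2.1 on a finite prefix. We take $\omega_j = j^{2s}$ exactly and, as an assumption of our own, rescale the severely ill-posed sequence to $\lambda_j = \exp(1 - j^{2b})$ so that $\lambda_1 = 1$; this rescaling only changes constants.

```python
import math


def illustration_sequences(J, p, s, b, kind):
    """Weight sequences of Illustration 2.2 for j = 1..J.
    gamma_j = j^(2p), omega_j = j^(2s) (taken exactly), and
    lambda_j = j^(-2b) for kind "m" or exp(1 - j^(2b)) for kind "s".
    The factor e in the severe case is our normalisation to get lambda_1 = 1."""
    js = [float(j) for j in range(1, J + 1)]
    gamma = [j ** (2 * p) for j in js]
    omega = [j ** (2 * s) for j in js]
    if kind == "m":
        lam = [j ** (-2 * b) for j in js]
    elif kind == "s":
        lam = [math.exp(1.0 - j ** (2 * b)) for j in js]
    else:
        raise ValueError("kind must be 'm' or 's'")
    return gamma, omega, lam


def non_increasing(seq):
    return all(x >= y for x, y in zip(seq, seq[1:]))


def satisfies_assumption_21(gamma, omega, lam):
    """Assumption 2.1 on a finite prefix: strictly positive weights,
    gamma_1 = omega_1 = lambda_1 = 1, omega/gamma and lambda non-increasing."""
    if min(gamma + omega + lam) <= 0.0:
        return False  # also catches underflow of exp(1 - j^(2b)) to 0.0
    if not (gamma[0] == omega[0] == lam[0] == 1.0):
        return False
    return non_increasing([w / g for w, g in zip(omega, gamma)]) and non_increasing(lam)
```

With $s > p$ the ratio $\omega_j/\gamma_j$ increases and the check fails, which is why the illustration requires $p \ge s$.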

The following result states lower risk bounds for the estimation of / and thus describes the complexity 
of the problem. 

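Theorem 2.3 Suppose that we observe sequences $Y$ and $X$ according to the model (1.1). Consider sequences $\omega$, $\gamma$ and $\lambda$ satisfying Assumption 2.1. For all $\varepsilon,\sigma\in(0,1)$, define

$$\chi_\varepsilon := \min_{k\in\mathbb N}\max\Big(\frac{\omega_k}{\gamma_k},\ \varepsilon\sum_{j=1}^k\frac{\omega_j}{\lambda_j}\Big),\qquad k^*_\varepsilon := \operatorname*{argmin}_{k\in\mathbb N}\max\Big(\frac{\omega_k}{\gamma_k},\ \varepsilon\sum_{j=1}^k\frac{\omega_j}{\lambda_j}\Big),\qquad \kappa_\sigma := \max_{j\in\mathbb N}\frac{\omega_j}{\gamma_j}\min\Big(1,\frac{\sigma}{\lambda_j}\Big). \tag{2.1}$$

If $\eta := \inf_{\varepsilon\in(0,1)}\big\{\chi_\varepsilon^{-1}\min\big(\omega_{k^*_\varepsilon}\gamma_{k^*_\varepsilon}^{-1},\ \varepsilon\sum_{j=1}^{k^*_\varepsilon}\omega_j\lambda_j^{-1}\big)\big\} > 0$, then

$$\inf_{\tilde f}\ \sup_{f\in\mathcal F_\gamma^r}\ \sup_{A\in\mathcal A_\lambda^d}\mathbb E\|\tilde f - f\|_\omega^2 \ \ge\ \frac14\min(\eta, r)\min\Big(r, \frac{1}{2d}, (1 - d^{-1/2})^2\Big)\max(\chi_\varepsilon,\kappa_\sigma),$$

where the infimum is to be taken over all possible estimators $\tilde f$ of $f$.

It is noteworthy that, apart from the unwieldy constant, the lower bound is given by two terms ($\chi_\varepsilon$ and $\kappa_\sigma$), each of which depends on only one noise level. We show in the proof that $\chi_\varepsilon$ is actually, up to a constant, a lower risk bound uniformly for any known operator $A$ in the class $\mathcal A_\lambda^d$. Hence, in this case no supremum over the class $\mathcal A_\lambda^d$ would be needed. The term $\kappa_\sigma$ only arises if the operator is unknown in $\mathcal A_\lambda^d$. The proof of this lower bound is based on a comparison of different inverse problems with different operators in $\mathcal A_\lambda^d$, whence the supremum over this class. The term $\kappa_\sigma$ quantifies to which extent the additional difficulty arising from the preliminary estimation of the singular values $a_j$ influences the possible estimation accuracy for $f$: as long as $\chi_\varepsilon \ge \kappa_\sigma$, the same lower bound as in the case of known singular values holds. Otherwise, the lower bound increases. Notice further that the term $\max(\omega_k/\gamma_k,\ \varepsilon\sum_{j=1}^k\omega_j/\lambda_j)$ in (2.1) corresponds, up to constants, to the maximal MISE of the orthogonal series estimator $\hat f_k$ in the case of known singular values, and $k^*_\varepsilon$ is its minimizer with respect to $k$. Under classical smoothness assumptions, the rates and $k^*_\varepsilon$ take the following forms.

Illustration 2.4 In the special cases defined in Illustration 2.2 above, the quantities from (2.1) are

[m] $\chi_\varepsilon \sim \varepsilon^{2(p-s)/(2p+2b+1)}$, $k^*_\varepsilon \sim \varepsilon^{-1/(2p+2b+1)}$, $\kappa_\sigma \sim \sigma^{((p-s)\wedge b)/b}$;

[s] $\chi_\varepsilon \sim |\log\varepsilon|^{-(p-s)/b}$, $k^*_\varepsilon \sim |\log\varepsilon|^{1/(2b)}$, $\kappa_\sigma \sim |\log\sigma|^{-(p-s)/b}$.

The following theorem shows that the orthogonal series estimator $\hat f_{k^*_\varepsilon}$ with optimal parameter $k^*_\varepsilon$ given in (2.1) actually attains the lower risk bound up to a constant and is thus minimax optimal.

Theorem 2.5 Under the assumptions of Theorem 2.3, there is a constant $C > 0$ depending only on $d$ such that the estimator $\hat f_{k^*_\varepsilon}$ satisfies, for all $\varepsilon,\sigma\in(0,1)$,

$$\sup_{f\in\mathcal F_\gamma^r}\ \sup_{A\in\mathcal A_\lambda^d}\mathbb E\|\hat f_{k^*_\varepsilon} - f\|_\omega^2 \le C(1+r)\max(\chi_\varepsilon,\kappa_\sigma).$$

² $b_p \sim c_p$ means that $\lim_{p\to 0} b_p/c_p$ exists in $(0,\infty)$.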
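The quantities in (2.1) can be evaluated numerically once the sequences are truncated at a finite $J$ (our assumption; $J$ has to exceed $k^*_\varepsilon$ and the maximizer in $\kappa_\sigma$). The sketch below is not from the paper; it follows the convention of footnote 1 and returns the smallest minimizer.

```python
import math


def minimax_quantities(gamma, omega, lam, eps, sigma):
    """Truncated versions of (2.1), computed over j, k = 1..J:
    chi_eps = min_k max(omega_k/gamma_k, eps * sum_{j<=k} omega_j/lambda_j),
    k_star  = smallest minimiser (footnote 1 convention),
    kappa   = max_j omega_j/gamma_j * min(1, sigma/lambda_j)."""
    chi, k_star, partial = math.inf, None, 0.0
    for k, (g, w, l) in enumerate(zip(gamma, omega, lam), start=1):
        partial += w / l
        value = max(w / g, eps * partial)
        if value < chi:  # strict '<' keeps the smallest minimiser
            chi, k_star = value, k
    kappa = max(w / g * min(1.0, sigma / l) for g, w, l in zip(gamma, omega, lam))
    return chi, k_star, kappa
```

For instance, with $\gamma_j = j^4$, $\omega_j = 1$ and $\lambda_j = j^{-2}$ (case [m] with $p = 2$, $s = 0$, $b = 1$), the ratio $\chi_\varepsilon/\varepsilon^{4/7}$ should stay of order one as $\varepsilon$ decreases, in line with Illustration 2.4.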



To conclude this section, let us summarize the resulting optimal convergence rates under the classical smoothness assumptions introduced in Illustration 2.2. In order to characterize the influence of the second noise level $\sigma$, we consider it as a function of the first noise level $\varepsilon$.

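Illustration 2.6 Let $(\sigma_\varepsilon)_{\varepsilon\in(0,1)}$ be a noise level in $X$ depending on the noise level $\varepsilon$ in $Y$.

[m] Let $p > 1/2$, $b > 1$ and $0 \le s < p$. If $q_1 := \lim_{\varepsilon\to0}\varepsilon^{-2((p-s)\vee b)/(2p+2b+1)}\sigma_\varepsilon$ exists³, then

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\mathbb E\|\hat f^{(s)}_{k^*_\varepsilon} - f^{(s)}\|_{L^2}^2 = \begin{cases} O\big(\varepsilon^{2(p-s)/(2p+2b+1)}\big) & \text{if } q_1 < \infty,\\ O\big(\sigma_\varepsilon^{((p-s)\wedge b)/b}\big) & \text{otherwise.}\end{cases}$$

[s] Let $p > 1/2$, $b > 0$ and $0 \le s \le p$. If $q_2 := \lim_{\varepsilon\to0}|\log\varepsilon|\,|\log\sigma_\varepsilon|^{-1}$ exists, then

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\mathbb E\|\hat f^{(s)}_{k^*_\varepsilon} - f^{(s)}\|_{L^2}^2 = \begin{cases} O\big(|\log\varepsilon|^{-(p-s)/b}\big) & \text{if } q_2 < \infty,\\ O\big(|\log\sigma_\varepsilon|^{-(p-s)/b}\big) & \text{otherwise.}\end{cases}$$

This illustration shows that often the same optimal rates as in the case of known singular values hold even when $\varepsilon < \sigma$.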
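For case [m], the comparison in Illustration 2.6 reduces to arithmetic on exponents if one assumes $\sigma_\varepsilon = \varepsilon^\alpha$ for some $\alpha > 0$ (a parametrisation we choose for the sketch). Using the orders from Illustration 2.4, the helpers below return the exponent of $\max(\chi_\varepsilon,\kappa_\sigma)$ and the threshold on $\alpha$ from which $q_1 < \infty$ holds; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def mild_rate_exponent(p, s, b, alpha):
    """Exponent r with max(chi_eps, kappa_sigma) ~ eps^r in case [m]
    when sigma_eps = eps^alpha: chi_eps ~ eps^(2(p-s)/(2p+2b+1)) and
    kappa_sigma ~ sigma^(min(p-s, b)/b); the smaller exponent dominates."""
    p, s, b, alpha = map(Fraction, (p, s, b, alpha))
    chi_exp = 2 * (p - s) / (2 * p + 2 * b + 1)
    kappa_exp = alpha * min(p - s, b) / b
    return min(chi_exp, kappa_exp)


def critical_alpha(p, s, b):
    """Smallest alpha with q_1 < infinity, i.e. sigma_eps <= eps^(2 max(p-s, b)/(2p+2b+1))."""
    p, s, b = map(Fraction, (p, s, b))
    return 2 * max(p - s, b) / (2 * p + 2 * b + 1)
```

For $p = 3$, $s = 0$, $b = 2$ the threshold is $6/11$: for $\alpha \ge 6/11$ the rate $\varepsilon^{6/11}$ of the known-operator case is kept, while smaller $\alpha$ lets $\kappa_\sigma$ dominate.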



3. Adaptation 

In this section, we construct a fully data-driven estimator of $f$ following the procedure sketched in (1.2) and (1.3). The following lemma will be our key tool when controlling the risk of the adaptive estimator.

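Lemma 3.1 Let $\mathrm{pen}$ be an arbitrary positive sequence and $K\in\mathbb N$. Consider the sequence $\Psi$ of contrasts $\Psi_k := \max_{k\le j\le K}(\|\hat f_j - \hat f_k\|_\omega^2 - \mathrm{pen}_j)_+$ and $\hat k := \operatorname{argmin}_{1\le j\le K}\{\Psi_j + \mathrm{pen}_j\}$, where $(t)_+ := \max(t, 0)$. If $(\mathrm{pen}_1,\dots,\mathrm{pen}_K)$ is non-decreasing, then we have for all $1\le k\le K$ that

$$\|\hat f_{\hat k} - f\|_\omega^2 \le 7\,\mathrm{pen}_k + 78\,\mathrm{bias}_k^2 + 42\max_{k\le j\le K}\Big(\|\hat f_j - f_j\|_\omega^2 - \tfrac16\mathrm{pen}_j\Big)_+, \tag{3.1}$$

where we denote by $f_j := \sum_{l=1}^j [f]_l\phi_l$ the projection of $f$ onto the first $j$ basis vectors in $H$ and by $\mathrm{bias}_k := \sup_{j\ge k}\|f - f_j\|_\omega$ the bias due to the projection.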

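Proof In view of the definition of $\hat k$, we have for all $1\le k\le K$ that

$$\|\hat f_{\hat k} - f\|_\omega^2 \le 3\{\Psi_k + \mathrm{pen}_{\hat k} + \Psi_{\hat k} + \mathrm{pen}_k + \|\hat f_k - f\|_\omega^2\} \le 6\{\Psi_k + \mathrm{pen}_k\} + 3\|\hat f_k - f\|_\omega^2. \tag{3.2}$$

Since $(\mathrm{pen}_1,\dots,\mathrm{pen}_K)$ is non-decreasing and $4\,\mathrm{bias}_k^2 \ge \max_{k\le j\le K}\|f_k - f_j\|_\omega^2$, we have

$$\Psi_k \le 6\max_{k\le j\le K}\Big(\|\hat f_j - f_j\|_\omega^2 - \tfrac16\mathrm{pen}_j\Big)_+ + 12\,\mathrm{bias}_k^2.$$

It is easily verified that for all $1\le k\le K$ we have

$$\|\hat f_k - f\|_\omega^2 \le \tfrac13\mathrm{pen}_k + 2\,\mathrm{bias}_k^2 + 2\Big(\|\hat f_k - f_k\|_\omega^2 - \tfrac16\mathrm{pen}_k\Big)_+.$$

The result follows by combining the last estimates with (3.2). □

³ The limit $\infty$, meaning strict divergence, is admissible.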
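The selection rule of Lemma 3.1 is short to state in code. The sketch below computes the contrasts $\Psi_k$ and the smallest minimizer $\hat k$ of $\Psi_k + \mathrm{pen}_k$ for given coefficient lists $\hat f_1,\dots,\hat f_K$ and a non-decreasing penalty. Since (3.1) is a deterministic inequality, the accompanying check may draw arbitrary vectors. The function names are hypothetical.

```python
def wdist2(u, v, omega):
    """Weighted squared distance ||u - v||_omega^2 of two coefficient lists."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(omega, u, v))


def gl_select(fhat, pen, omega):
    """Contrast-based choice of k as in Lemma 3.1.

    fhat[k - 1] holds the coefficients of hat f_k (k = 1..K) and pen[k - 1]
    the penalty pen_k, assumed non-decreasing. Returns (k_hat, psi), where
    psi[k - 1] = max_{k <= j <= K} (||hat f_j - hat f_k||_omega^2 - pen_j)_+
    and k_hat is the smallest minimiser of psi_k + pen_k (footnote 1)."""
    K = len(fhat)
    psi = [max(max(wdist2(fhat[j], fhat[k], omega) - pen[j], 0.0) for j in range(k, K))
           for k in range(K)]
    crit = [psi[k] + pen[k] for k in range(K)]
    k_hat = min(range(K), key=lambda k: crit[k]) + 1  # min() keeps the first minimiser
    return k_hat, psi
```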

Since the lemma is valid for any upper bound $K$ and any non-decreasing sequence of penalties $\mathrm{pen}$, we need to specify our choice. Let us first define some auxiliary quantities required in the construction of the random penalty sequence $\mathrm{pen}$ and the upper bound $K$.

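Definition 3.2 For any sequence $a := (a_j)_{j\in\mathbb N}$, define

(i) $\Delta_k^a := \max_{1\le j\le k}\omega_j/a_j^2$ and $\delta_k^a := k\,\Delta_k^a\,\dfrac{\log(\Delta_k^a\vee(k+2))}{\log(k+2)}$;

(ii) given $\omega_k^+ := \max_{1\le j\le k}\omega_j$, $N^\circ := \max\{1\le N\le\lfloor\varepsilon^{-1}\rfloor \mid \ldots\}$ and $\nu_\sigma := (8\log(\log(\sigma^{-1}+20)))^{-1}$, let

$$N^\varepsilon_a := \min\Big\{2\le j\le N^\circ \;\Big|\; \frac{a_j^2}{j\,\omega_j^+} < \varepsilon|\log\varepsilon|\Big\} - 1 \quad\text{and}\quad M^\sigma_a := \min\big\{2\le j\le\lfloor\sigma^{-1}\rfloor \;\big|\; a_j^2 < \sigma^{1-\nu_\sigma}\big\} - 1,$$

and $K^{\varepsilon,\sigma}_a := N^\varepsilon_a \wedge M^\sigma_a$. If the defining set is empty, set $N^\varepsilon_a = N^\circ$ or $M^\sigma_a = \lfloor\sigma^{-1}\rfloor$, respectively.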

Choosing appropriate sequences $a$, these quantities allow us to define the random penalty term needed for the data-driven choice of $k$ as well as its deterministic counterpart, which will be used in the control of the risk.

Using this definition and denoting by $X$ the sequence of random variables $(X_j)_{j\in\mathbb N}$, define

$$K_{\varepsilon,\sigma} := K^{\varepsilon,\sigma}_X \quad\text{and}\quad \mathrm{pen}_k := 600\,\delta^X_k\,\varepsilon. \tag{3.3}$$
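Part (i) of Definition 3.2 and the penalty (3.3) translate directly into code. The sketch assumes that the upper bound $K$ is supplied by the caller (computing $K^{\varepsilon,\sigma}_X$ would require part (ii) in full) and that all $X_j$ used are non-zero; the function names are ours.

```python
import math


def delta_sequences(a, omega):
    """Definition 3.2 (i): Delta_k = max_{j<=k} omega_j / a_j^2 and
    delta_k = k * Delta_k * log(max(Delta_k, k + 2)) / log(k + 2), k = 1..len(a)."""
    Delta, delta, running = [], [], 0.0
    for k, (aj, wj) in enumerate(zip(a, omega), start=1):
        running = max(running, wj / aj ** 2)
        Delta.append(running)
        delta.append(k * running * math.log(max(running, k + 2)) / math.log(k + 2))
    return Delta, delta


def penalty(X, omega, eps, K):
    """Random penalty (3.3), pen_k = 600 * delta_k^X * eps for k = 1..K,
    obtained by plugging the observed X_j into Definition 3.2 (i)."""
    _, delta = delta_sequences(X[:K], omega[:K])
    return [600.0 * d * eps for d in delta]
```

Since $\delta^a_k = k\Delta^a_k\max\{1, \log\Delta^a_k/\log(k+2)\}$ with $\Delta^a_k$ non-decreasing and $k/\log(k+2)$ increasing, the resulting penalty is non-decreasing in $k$, as Lemma 3.1 requires.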

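Substituting these definitions in (1.2) and (1.3) yields a choice $\hat k$ of the dimension parameter depending exclusively on the observations and the noise levels, but not on any underlying smoothness classes. Consider the upper risk bound in Lemma 3.1. In order to control the risk of the data-driven estimator, we decompose it with respect to an event on which the random quantities $\mathrm{pen}_k$ and $K_{\varepsilon,\sigma}$ are close to deterministic counterparts $\mathrm{pen}^a_k$, $K^-_{\varepsilon,\sigma}$ and $K^+_{\varepsilon,\sigma}$, to be defined below in Propositions 3.3 and 3.5. More precisely, consider the event

$$\mathcal U_{\varepsilon,\sigma} := \{\mathrm{pen}^a_k \le \mathrm{pen}_k \le 30\,\mathrm{pen}^a_k\ \ \forall\,1\le k\le K^+_{\varepsilon,\sigma}\}\cap\{K^-_{\varepsilon,\sigma}\le K_{\varepsilon,\sigma}\le K^+_{\varepsilon,\sigma}\}$$

and the corresponding risk decomposition

$$\mathbb E\|\hat f_{\hat k} - f\|_\omega^2 = \mathbb E\big[\|\hat f_{\hat k} - f\|_\omega^2\,\mathbb 1_{\mathcal U_{\varepsilon,\sigma}}\big] + \mathbb E\big[\|\hat f_{\hat k} - f\|_\omega^2\,\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\big]. \tag{3.4}$$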

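As the random sequence $(\mathrm{pen}_k)$ is non-decreasing in $k$ by construction, we may apply Lemma 3.1 and obtain for every $1\le k\le K_{\varepsilon,\sigma}$

$$\|\hat f_{\hat k} - f\|_\omega^2 \le 7\,\mathrm{pen}_k + 78\,\mathrm{bias}_k^2 + 42\max_{k\le j\le K_{\varepsilon,\sigma}}\Big(\|\hat f_j - f_j\|_\omega^2 - \tfrac16\mathrm{pen}_j\Big)_+.$$

On the event $\mathcal U_{\varepsilon,\sigma}$, this implies that

$$\mathbb E\big[\|\hat f_{\hat k} - f\|_\omega^2\,\mathbb 1_{\mathcal U_{\varepsilon,\sigma}}\big] \le 420\min_{1\le k\le K^-_{\varepsilon,\sigma}}\{\max(\mathrm{pen}^a_k, \mathrm{bias}_k^2)\} + 42\,\mathbb E\Big[\max_{1\le j\le K^+_{\varepsilon,\sigma}}\Big(\|\hat f_j - f_j\|_\omega^2 - \tfrac16\mathrm{pen}^a_j\Big)_+\Big]. \tag{3.5}$$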



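The second term in the last inequality is controlled uniformly over $\mathcal F_\gamma^r$ and $\mathcal A_\lambda^d$ by the following proposition.

Proposition 3.3 Given $A\in\mathcal A_\lambda^d$ with singular values $a := (a_j)_{j\in\mathbb N}$, let $\sqrt{4d\lambda} := (\sqrt{4d\lambda_j})_{j\in\mathbb N}$ and define $K^+_{\varepsilon,\sigma} := K^{\varepsilon,\sigma}_{\sqrt{4d\lambda}}$, $M^+_\sigma := M^\sigma_{\sqrt{4d\lambda}}$ and $\mathrm{pen}^a_k := 60\,\delta^a_k\,\varepsilon$ using Definition 3.2. There is a constant $C > 0$ depending only on the class $\mathcal A_\lambda^d$ such that

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\mathbb E\Big[\max_{1\le k\le K^+_{\varepsilon,\sigma}}\Big(\|\hat f_k - f_k\|_\omega^2 - \tfrac16\mathrm{pen}^a_k\Big)_+\Big] \le C(\varepsilon + r\kappa_\sigma + \sigma).$$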

Roughly speaking, the penalty term is an upper bound for the variation of the estimator. Typically, it can be chosen as a multiple of the estimator's variance. Thus, inequality (3.1) actually features a bias-variance decomposition of the risk with an additional third term, which is controlled by the above proposition.
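To see why a multiple of $\varepsilon\,\delta^a_k$ is a natural size for the penalty, consider the idealised case $\sigma = 0$ of known singular values (an assumption made only for this calculation). Then $X_j = a_j$, so $[\hat f_k - f_k]_j = \sqrt\varepsilon\,\xi_j/a_j$ for $j\le k$, and

$$\mathbb E\|\hat f_k - f_k\|_\omega^2 = \varepsilon\sum_{j=1}^k\frac{\omega_j}{a_j^2}\le\varepsilon\,k\,\Delta^a_k\le\varepsilon\,\delta^a_k,$$

where the last step uses $\log(\Delta^a_k\vee(k+2))/\log(k+2)\ge1$. The additional logarithmic factor in $\delta^a_k$ can be read as room for deviations of $\sum_{j\le k}\omega_j\xi_j^2/a_j^2$ above its mean, uniformly in $k$ (compare Lemma A.3).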

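Illustration 3.4 Note that for any operator $A\in\mathcal A_\lambda^d$ with sequence $a$ of singular values, the sequence $\delta^a$ appearing in the definition of the penalty term $\mathrm{pen}^a$ satisfies $(dQ)^{-1}\le\delta^a_k/\delta^\lambda_k\le dQ$ for all $k\in\mathbb N$, with $Q = \log(3d)/\log(3)$; here and below, $\delta^\lambda$ stands for Definition 3.2 (i) applied with $a_j^2 = \lambda_j$. In the special cases defined in Illustration 2.2 above, the sequence $\delta^\lambda$ takes the following form:

[m] $\delta^\lambda_k\sim k^{2b+2s+1}$; [s] $\delta^\lambda_k\sim k^{2b+2s+1}\exp(k^{2b})(\log k)^{-1}$.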
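For case [m], a direct computation confirms the stated order, assuming $\omega_j = j^{2s}$ exactly and $a_j^2 = \lambda_j = j^{-2b}$:

$$\Delta^\lambda_k = \max_{1\le j\le k} j^{2s+2b} = k^{2s+2b},\qquad \delta^\lambda_k = k^{2b+2s+1}\max\Big\{1,\ \frac{(2s+2b)\log k}{\log(k+2)}\Big\},$$

and the maximum converges to $\max\{1, 2s+2b\}\in(0,\infty)$ as $k\to\infty$, so that $\delta^\lambda_k\sim k^{2b+2s+1}$.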

The next proposition ensures that the randomized upper bound and penalty sequence behave similarly 
to their deterministic counterparts with sufficiently high probability so as not to deteriorate the 
estimation risk. In view of Proposition 3.3, this justifies the choice of the penalty. 

Proposition 3.5 Let $K^-_{\varepsilon,\sigma} := K^{\varepsilon,\sigma}_{[\ldots]}$ and $M^-_\sigma := M^\sigma_{[\ldots]}$ using Definition 3.2 and suppose that there is a constant $L > 0$ depending only on $\lambda$ and $d$ such that

[…] (3.6)

Then, there is a constant $C > 0$ depending only on the class $\mathcal A_\lambda^d$ such that

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\mathbb E\big[\|\hat f_{\hat k} - f\|_\omega^2\,\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\big] \le C(1+r)\,\sigma \quad\text{for all } \varepsilon,\sigma\in(0,1).$$

Condition (3.6) is satisfied in particular under the classical smoothness assumptions considered in the illustrations. We are finally prepared to state the upper risk bound of the fully data-driven estimator $\hat f_{\hat k}$ of $f$, which is the main result of this article.

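Theorem 3.6 Under Assumption 2.1 and supposing (3.6), there is a constant $C$ depending only on the class $\mathcal A_\lambda^d$ such that for all $\varepsilon,\sigma\in(0,1)$ the adaptive estimator $\hat f_{\hat k}$ satisfies

$$\mathcal R_\omega(\hat f_{\hat k},\mathcal F_\gamma^r,\mathcal A_\lambda^d) \le C(1+r)\Big\{\min_{1\le k\le K^-_{\varepsilon,\sigma}}\max\Big(\frac{\omega_k}{\gamma_k},\ \delta^\lambda_k\varepsilon\Big) + \kappa_\sigma + \varepsilon + \sigma\Big\}.$$

Proof. Considering (3.5), note that for all $A\in\mathcal A_\lambda^d$ we have $\mathrm{pen}^a_k \le 60\,\varepsilon\,dQ\,\delta^\lambda_k$ with $Q = \log(3d)/\log(3)$. On the other hand, it is easily seen that for all $f\in\mathcal F_\gamma^r$ one has $\mathrm{bias}_k^2\le r\,\omega_k/\gamma_k$. Thus, we can write

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\min_{1\le k\le K^-_{\varepsilon,\sigma}}\{\max(\mathrm{pen}^a_k,\mathrm{bias}_k^2)\} \le C(1+r)\min_{1\le k\le K^-_{\varepsilon,\sigma}}\Big\{\max\Big(\frac{\omega_k}{\gamma_k},\ \delta^\lambda_k\varepsilon\Big)\Big\}$$

for some constant $C > 0$ depending only on $d$. In view of (3.4), the rest of the proof is obvious using Propositions 3.3 and 3.5. □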

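A comparison with the lower bound from Theorem 2.3 shows that this upper bound ensures minimax optimality of the adaptive estimator $\hat f_{\hat k}$ only if

$$\chi^\sharp_{\varepsilon,\sigma} := \min_{1\le k\le K^-_{\varepsilon,\sigma}}\max\Big(\frac{\omega_k}{\gamma_k},\ \delta^\lambda_k\varepsilon\Big)$$

is at most of the same order as $\max(\chi_\varepsilon,\kappa_\sigma)$, whence the following corollary. (The remaining terms $\varepsilon$ and $\sigma$ are always dominated, since $\gamma_1 = \omega_1 = \lambda_1 = 1$ implies $\chi_\varepsilon\ge\varepsilon$ and $\kappa_\sigma\ge\sigma$.)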
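Corollary 3.7 Under Assumption 2.1 and if $\sup_{\varepsilon,\sigma\in(0,1)}\{\chi^\sharp_{\varepsilon,\sigma}/\max(\chi_\varepsilon,\kappa_\sigma)\} < \infty$, there is a constant $C > 0$ not depending on $\varepsilon$ and $\sigma$ such that

$$\mathcal R_\omega(\hat f_{\hat k},\mathcal F_\gamma^r,\mathcal A_\lambda^d) \le C\max(\chi_\varepsilon,\kappa_\sigma)\quad\text{for all } \varepsilon,\sigma\in(0,1).$$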

We conclude this article by reconsidering the framework of Illustration 2.6. Notice that the adaptive estimator is minimax optimal over a wide range of cases, even when $\varepsilon < \sigma$.

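Illustration 3.8 Let $(\sigma_\varepsilon)_{\varepsilon\in(0,1)}$ be a noise level in $X$ depending on the noise level $\varepsilon$ in $Y$ and suppose that the limits $q_1$ and $q_2$ from Illustration 2.6 exist in the respective cases. Some straightforward computations then show that the adaptive estimator attains the following rates of convergence.

[m] If $p - s > b$, the adaptive estimator $\hat f_{\hat k}$ attains the optimal rates (cf. Illustration 2.6). In case $p - s\le b$, supposing that $\tilde q_1 := \lim_{\varepsilon\to0}\varepsilon^{-2b/(2p+2b+1)}\sigma_\varepsilon^{1-\nu_{\sigma_\varepsilon}}$ exists, we have

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\mathbb E\|\hat f^{(s)}_{\hat k} - f^{(s)}\|_{L^2}^2 = \begin{cases}O\big(\varepsilon^{2(p-s)/(2p+2b+1)}\big) & \text{if } q_1 < \infty \text{ and } \tilde q_1 < \infty,\\ O\big(\sigma_\varepsilon^{(1-\nu_{\sigma_\varepsilon})(p-s)/b}\big) & \text{otherwise.}\end{cases}$$

[s] The adaptive estimator attains the optimal rates.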



A. Proofs

A.1. Minimax theory (Section 2)

Lower risk bound

Proof of Theorem 2.3. The proof consists of two steps: (A) First, we show that $\chi_\varepsilon$ yields a lower risk bound in the case where the singular values $(a_j)$ of the operator $A$ are known. (B) Then, we show that another lower risk bound is given by $\kappa_\sigma$.

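Step (A). Given $\zeta := \eta\min(r, 1/(2d))$ and $\alpha_\varepsilon := \chi_\varepsilon\big(\sum_{j=1}^{k^*_\varepsilon}\varepsilon\,\omega_j/\lambda_j\big)^{-1}$, we consider the function $f := (\varepsilon\zeta\alpha_\varepsilon)^{1/2}\sum_{j=1}^{k^*_\varepsilon}\lambda_j^{-1/2}\phi_j$. We are going to show that for any $\theta := (\theta_j)\in\{-1,1\}^{k^*_\varepsilon}$ the function $f_\theta := \sum_{j=1}^{k^*_\varepsilon}\theta_j[f]_j\phi_j$ belongs to $\mathcal F_\gamma^r$ and is hence a possible candidate for the solution. For a fixed $\theta$ and under the hypothesis that the solution is $f_\theta$, the observation $Y_k$ is distributed according to $\mathcal N(a_k[f_\theta]_k, \varepsilon)$ for any $k\in\mathbb N$. We denote by $\mathbb P_\theta$ the distribution of the resulting sequence $(Y_k)$ and by $\mathbb E_\theta$ the expectation with respect to this distribution.

Furthermore, for $1\le j\le k^*_\varepsilon$ and each $\theta$, we introduce $\theta^{(j)}$ by $\theta^{(j)}_l = \theta_l$ for $l\ne j$ and $\theta^{(j)}_j = -\theta_j$. The key argument of this proof is the following reduction scheme. If $\tilde f$ denotes an estimator of $f$, then we conclude

$$\sup_{f\in\mathcal F_\gamma^r}\mathbb E\|\tilde f - f\|_\omega^2 \ge \sup_{\theta\in\{-1,1\}^{k^*_\varepsilon}}\mathbb E_\theta\|\tilde f - f_\theta\|_\omega^2 \ge \frac{1}{2^{k^*_\varepsilon}}\sum_{\theta\in\{-1,1\}^{k^*_\varepsilon}}\sum_{j=1}^{k^*_\varepsilon}\omega_j\,\mathbb E_\theta|[\tilde f - f_\theta]_j|^2$$
$$= \frac{1}{2^{k^*_\varepsilon}}\sum_{\theta\in\{-1,1\}^{k^*_\varepsilon}}\sum_{j=1}^{k^*_\varepsilon}\frac{\omega_j}{2}\Big\{\mathbb E_\theta|[\tilde f - f_\theta]_j|^2 + \mathbb E_{\theta^{(j)}}|[\tilde f - f_{\theta^{(j)}}]_j|^2\Big\}. \tag{A.1}$$

Below we show furthermore that for all $\varepsilon\in(0,1)$ we have

$$\frac12\Big\{\mathbb E_\theta|[\tilde f - f_\theta]_j|^2 + \mathbb E_{\theta^{(j)}}|[\tilde f - f_{\theta^{(j)}}]_j|^2\Big\} \ge \frac{[f]_j^2}{4} = \frac{\varepsilon\zeta\alpha_\varepsilon}{4\lambda_j}. \tag{A.2}$$

Combining the last lower bound and the reduction scheme gives

$$\sup_{f\in\mathcal F_\gamma^r}\mathbb E\|\tilde f - f\|_\omega^2 \ge \frac{1}{2^{k^*_\varepsilon}}\sum_{\theta\in\{-1,1\}^{k^*_\varepsilon}}\sum_{j=1}^{k^*_\varepsilon}\frac{\varepsilon\zeta\alpha_\varepsilon\,\omega_j}{4\lambda_j} = \frac{\zeta\alpha_\varepsilon}{4}\sum_{j=1}^{k^*_\varepsilon}\frac{\varepsilon\,\omega_j}{\lambda_j} = \frac{\zeta\chi_\varepsilon}{4},$$

which implies the lower bound given in the theorem by definition of $\zeta$.

To complete the proof, it remains to check (A.2) and $f_\theta\in\mathcal F_\gamma^r$ for all $\theta\in\{-1,1\}^{k^*_\varepsilon}$. The latter is easily verified if $f\in\mathcal F_\gamma^r$, which can be seen recalling that $\omega/\gamma$ is non-increasing and noticing that the definitions of $\zeta$, $\alpha_\varepsilon$ and $\eta$ imply

$$\|f\|_\gamma^2 = \varepsilon\zeta\alpha_\varepsilon\sum_{j=1}^{k^*_\varepsilon}\frac{\gamma_j}{\lambda_j} \le \zeta\alpha_\varepsilon\frac{\gamma_{k^*_\varepsilon}}{\omega_{k^*_\varepsilon}}\sum_{j=1}^{k^*_\varepsilon}\frac{\varepsilon\,\omega_j}{\lambda_j} = \zeta\chi_\varepsilon\frac{\gamma_{k^*_\varepsilon}}{\omega_{k^*_\varepsilon}} \le \frac{\zeta}{\eta}\le r.$$

It remains to show (A.2). Consider the Hellinger affinity $\rho(\mathbb P_\theta,\mathbb P_{\theta^{(j)}}) = \int\sqrt{d\mathbb P_\theta\,d\mathbb P_{\theta^{(j)}}}$; then we obtain for any estimator $\tilde f$ of $f$ that

$$\rho(\mathbb P_\theta,\mathbb P_{\theta^{(j)}}) \le \int\frac{|[\tilde f - f_\theta]_j|}{|[f_\theta - f_{\theta^{(j)}}]_j|}\sqrt{d\mathbb P_\theta\,d\mathbb P_{\theta^{(j)}}} + \int\frac{|[\tilde f - f_{\theta^{(j)}}]_j|}{|[f_\theta - f_{\theta^{(j)}}]_j|}\sqrt{d\mathbb P_\theta\,d\mathbb P_{\theta^{(j)}}}.$$

Rewriting the last estimate by means of the Cauchy-Schwarz inequality, we obtain

$$\mathbb E_\theta|[\tilde f - f_\theta]_j|^2 + \mathbb E_{\theta^{(j)}}|[\tilde f - f_{\theta^{(j)}}]_j|^2 \ge \frac12|[f_\theta - f_{\theta^{(j)}}]_j|^2\,\rho^2(\mathbb P_\theta,\mathbb P_{\theta^{(j)}}). \tag{A.3}$$

Next, we bound the Hellinger affinity $\rho(\mathbb P_\theta,\mathbb P_{\theta^{(j)}})$ from below. Consider the Kullback-Leibler divergence of these two distributions first. The components of the two sequences corresponding to the distributions $\mathbb P_\theta$ and $\mathbb P_{\theta^{(j)}}$ are pairwise equally distributed except for the $j$-th component. Thus, we have $\log(d\mathbb P_\theta/d\mathbb P_{\theta^{(j)}}) = 2y_ja_j\theta_j[f]_j/\varepsilon$, and taking the integral over $y_j$ with respect to $\mathbb P_\theta$, we find, recalling $\alpha_\varepsilon\le\eta^{-1}$,

$$KL(\mathbb P_\theta,\mathbb P_{\theta^{(j)}}) = \frac2\varepsilon a_j^2[f]_j^2 \le \frac{2d}{\varepsilon}[f]_j^2\lambda_j = 2d\zeta\alpha_\varepsilon \le 1.$$

Using the well-known relationship $\rho(\mathbb P_\theta,\mathbb P_{\theta^{(j)}})\ge 1 - \frac12 KL(\mathbb P_\theta,\mathbb P_{\theta^{(j)}})$ between the Kullback-Leibler divergence and the Hellinger affinity, we obtain that $\rho(\mathbb P_\theta,\mathbb P_{\theta^{(j)}})\ge 1/2$. Using this estimate and $|[f_\theta - f_{\theta^{(j)}}]_j|^2 = 4[f]_j^2$, (A.3) becomes $\mathbb E_\theta|[\tilde f - f_\theta]_j|^2 + \mathbb E_{\theta^{(j)}}|[\tilde f - f_{\theta^{(j)}}]_j|^2 \ge [f]_j^2/2$, which is (A.2); combining this with (A.1) implies the result by construction of the solution $f$.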

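Step (B). First, we construct two solutions $f_\theta\in\mathcal F_\gamma^r$ and operators $A_\theta\in\mathcal A_\lambda^d$ (with $\theta\in\{-1,1\}$) such that the resulting images $g_\theta = A_\theta f_\theta$ satisfy $g_{-1} = g_1$. To this end, we define $k^\star := \operatorname{argmax}_{j\in\mathbb N}\{\omega_j\gamma_j^{-1}\min(1,\sigma\lambda_j^{-1})\}$ and $\alpha_\sigma := \zeta\min(1,\sigma^{1/2}\lambda_{k^\star}^{-1/2})$ with $\zeta := \min(2^{-1}, 1 - d^{-1/2})$. Observe that $1 > (1-\alpha_\sigma)^2 \ge (1 - (1 - d^{-1/2}))^2 = 1/d$ and $1 < (1+\alpha_\sigma)^2 \le (1 + (1 - d^{-1/2}))^2 = (2 - d^{-1/2})^2 \le d$, which implies $1/d\le(1+\theta\alpha_\sigma)^2\le d$. These inequalities will be used below without further reference. We show below that for each $\theta$ the function $f_\theta := (1-\theta\alpha_\sigma)(r/d)^{1/2}\gamma_{k^\star}^{-1/2}\phi_{k^\star}$ belongs to $\mathcal F_\gamma^r$ and that the operator $A_\theta$ with singular values $a^\theta_j := [1 + \theta\alpha_\sigma\mathbb 1\{j = k^\star\}]\sqrt{\lambda_j}$ is an element of $\mathcal A_\lambda^d$. We obviously have that $A_1f_1 = (1-\alpha_\sigma^2)(\lambda_{k^\star}/\gamma_{k^\star})^{1/2}(r/d)^{1/2}\psi_{k^\star} = A_{-1}f_{-1}$.

For $\theta\in\{-1,1\}$, denote by $\mathbb P_\theta$ the joint distribution of the two sequences $(X_1,X_2,\dots)$ and $(Y_1,Y_2,\dots)$, and let $\mathbb E_\theta$ denote the expectation with respect to $\mathbb P_\theta$.

Applying a reduction scheme as in Step (A) above, we deduce that for each estimator $\tilde f$ of $f$

$$\sup_{f\in\mathcal F_\gamma^r}\sup_{A\in\mathcal A_\lambda^d}\mathbb E\|\tilde f - f\|_\omega^2 \ge \max_{\theta\in\{-1,1\}}\mathbb E_\theta\|\tilde f - f_\theta\|_\omega^2 \ge \frac12\big(\mathbb E_1\|\tilde f - f_1\|_\omega^2 + \mathbb E_{-1}\|\tilde f - f_{-1}\|_\omega^2\big).$$

Below we show furthermore that

$$\mathbb E_1\|\tilde f - f_1\|_\omega^2 + \mathbb E_{-1}\|\tilde f - f_{-1}\|_\omega^2 \ge \frac18\|f_1 - f_{-1}\|_\omega^2. \tag{A.4}$$

Moreover, we have $\|f_1 - f_{-1}\|_\omega^2 = 4\alpha_\sigma^2(r/d)\,\omega_{k^\star}\gamma_{k^\star}^{-1} = 4\zeta^2(r/d)\,\omega_{k^\star}\gamma_{k^\star}^{-1}\min(1,\sigma\lambda_{k^\star}^{-1})$. Combining the last lower bound with the reduction scheme and the definition of $k^\star$ implies the result of the theorem.

To conclude the proof, it remains to check (A.4), $f_\theta\in\mathcal F_\gamma^r$ and $A_\theta\in\mathcal A_\lambda^d$ for both $\theta$. In order to show $f_\theta\in\mathcal F_\gamma^r$, observe that $\|f_\theta\|_\gamma^2 = \gamma_{k^\star}|[f_\theta]_{k^\star}|^2 = (1-\theta\alpha_\sigma)^2\,r/d \le r$.

To check that $A_\theta\in\mathcal A_\lambda^d$, it remains to show that $1/d\le(a^\theta_j)^2/\lambda_j\le d$ for all $j\ge1$. These inequalities are obviously satisfied for all $j\ne k^\star$, and as well for $j = k^\star$ by construction of the operator $A_\theta$. Finally, consider (A.4). As in Step (A) above, by employing the Hellinger affinity $\rho(\mathbb P_1,\mathbb P_{-1})$ we obtain for any estimator $\tilde f$ of $f$ that

$$\mathbb E_1\|\tilde f - f_1\|_\omega^2 + \mathbb E_{-1}\|\tilde f - f_{-1}\|_\omega^2 \ge \frac12\|f_1 - f_{-1}\|_\omega^2\,\rho^2(\mathbb P_1,\mathbb P_{-1}).$$

Next, we bound the Hellinger affinity $\rho(\mathbb P_1,\mathbb P_{-1})$ from below for all $\sigma\in(0,1)$, which proves (A.4). Notice that by construction of $f_\theta$ and $A_\theta$, the distributions of $X_j$ and $Y_l$ do not depend on $\theta$, except for $X_{k^\star}$. It is thus easily seen that the Kullback-Leibler divergence can be controlled as follows,

$$KL(\mathbb P_1,\mathbb P_{-1}) = \frac{(a^1_{k^\star} - a^{-1}_{k^\star})^2}{2\sigma} = \frac{2\alpha_\sigma^2\lambda_{k^\star}}{\sigma}\le 2\zeta^2\le\frac12.$$

Using $\rho(\mathbb P_1,\mathbb P_{-1})\ge 1 - \frac12KL(\mathbb P_1,\mathbb P_{-1})$ again, we obtain $\rho(\mathbb P_1,\mathbb P_{-1})\ge 1/2$; hence (A.4) is shown and so is the theorem. □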
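Both steps of the proof rest on the Kullback-Leibler divergence and the Hellinger affinity of two Gaussian laws with equal variance. In Step (A) the $j$-th observation is $\mathcal N(\pm a_j[f]_j,\varepsilon)$; in Step (B) the two laws of $X_{k^\star}$ have variance $\sigma$ and means differing by $2\alpha_\sigma\lambda_{k^\star}^{1/2}$, so by translation invariance both fit the symmetric case $\mathcal N(\mu, v)$ versus $\mathcal N(-\mu, v)$. The sketch below is ours, not part of the paper: it compares the closed forms with a numerical integral and checks $\rho\ge 1 - KL/2$.

```python
import math


def kl_gauss_shift(mu, var):
    """KL(N(mu, var) || N(-mu, var)) = (2 mu)^2 / (2 var) = 2 mu^2 / var."""
    return 2.0 * mu * mu / var


def affinity_closed(mu, var):
    """Hellinger affinity of N(mu, var) and N(-mu, var): exp(-(2 mu)^2 / (8 var))."""
    return math.exp(-mu * mu / (2.0 * var))


def affinity_numeric(mu, var, n=20000):
    """Midpoint-rule value of the integral of sqrt(p q) over [-L, L]."""
    def pdf(y, m):
        return math.exp(-(y - m) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    half_width = abs(mu) + 12.0 * math.sqrt(var)
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        total += math.sqrt(pdf(y, mu) * pdf(y, -mu))
    return h * total
```

With $\mu = a_j[f]_j$ and $v = \varepsilon$, the function `kl_gauss_shift` reproduces $KL = 2a_j^2[f]_j^2/\varepsilon$ from Step (A).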



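Upper risk bound

The following proof uses Lemma A.1 from the auxiliary results in Section A.3 below.

Proof of Theorem 2.5. Define $\check f := \sum_{j=1}^{k^*_\varepsilon}[f]_j\mathbb 1\{X_j^2\ge\sigma\}\phi_j$ and decompose the risk into two terms,

$$\mathbb E\|\hat f_{k^*_\varepsilon} - f\|_\omega^2 = \mathbb E\|\hat f_{k^*_\varepsilon} - \check f\|_\omega^2 + \mathbb E\|\check f - f\|_\omega^2 =: A + B,$$

which we bound separately (there is no cross term, since $\hat f_{k^*_\varepsilon} - \check f$ and $\check f - f$ have disjoint coefficient supports). Consider first $A$, which we decompose further, the cross terms vanishing because $\xi$ is centred and independent of $X$:

$$A = \varepsilon\sum_{j=1}^{k^*_\varepsilon}\omega_j\,\mathbb E\Big[\frac{\xi_j^2}{X_j^2}\mathbb 1\{X_j^2\ge\sigma\}\Big] + \sum_{j=1}^{k^*_\varepsilon}\omega_j[f]_j^2\,\mathbb E\Big[\Big|\frac{a_j}{X_j} - 1\Big|^2\mathbb 1\{X_j^2\ge\sigma\}\Big] =: A_1 + A_2. \tag{A.5}$$

As far as $A_1$ is concerned, we use Lemma A.1 (iii) from Section A.3 below and write

$$A_1 = \varepsilon\sum_{j=1}^{k^*_\varepsilon}\frac{\omega_j}{a_j^2}\,\mathbb E\Big[\frac{a_j^2}{X_j^2}\mathbb 1\{X_j^2\ge\sigma\}\Big] \le 4d\,\varepsilon\sum_{j=1}^{k^*_\varepsilon}\frac{\omega_j}{\lambda_j}\le 4d\,\chi_\varepsilon.$$

As for $A_2$, we apply Lemma A.1 (i) and obtain

$$A_2 \le \sum_{j=1}^{k^*_\varepsilon}\omega_j[f]_j^2\min\Big(1,\frac{8\sigma}{a_j^2}\Big) \le 8d\sum_{j=1}^{\infty}\gamma_j[f]_j^2\,\frac{\omega_j}{\gamma_j}\min\Big(1,\frac{\sigma}{\lambda_j}\Big)\le 8d\,r\,\kappa_\sigma.$$

Consider now $B$, which we decompose further into

$$B = \sum_{j\in\mathbb N}\omega_j[f]_j^2\,\mathbb E\big(1 - \mathbb 1\{1\le j\le k^*_\varepsilon\}\mathbb 1\{X_j^2\ge\sigma\}\big)^2 = \sum_{j>k^*_\varepsilon}\omega_j[f]_j^2 + \sum_{j=1}^{k^*_\varepsilon}\omega_j[f]_j^2\,\mathbb P(X_j^2<\sigma) =: B_1 + B_2,$$

where $B_1\le\|f\|_\gamma^2\,\omega_{k^*_\varepsilon}\gamma_{k^*_\varepsilon}^{-1}\le r\chi_\varepsilon$ because $f\in\mathcal F_\gamma^r$. Moreover, $B_2\le 4d\,r\,\kappa_\sigma$ using Lemma A.1 (ii). The result of the theorem now follows by combining the decomposition (A.5) with the estimates of $A_1$, $A_2$, $B_1$ and $B_2$. □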



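A.2. Adaptive estimation (Section 3)

The proofs in this section use Lemmas A.3 to A.6 from the auxiliary results in Section A.3 below.

Proof of Proposition 3.3. Using the model equation $Y_j = [g]_j + \sqrt\varepsilon\,\xi_j$, we have for all $1\le j\le k$ that

$$[\hat f_k - f_k]_j = \frac{\sqrt\varepsilon\,\xi_j}{a_j} + \frac{\sqrt\varepsilon\,\xi_j}{a_j}\Big(\frac{a_j}{X_j}\mathbb 1\{X_j^2\ge\sigma\} - 1\Big) + [f]_j\Big(\frac{a_j}{X_j}\mathbb 1\{X_j^2\ge\sigma\} - 1\Big).$$

Thus, we may decompose the norm $\|\hat f_k - f_k\|_\omega^2$ into three terms according to

$$\|\hat f_k - f_k\|_\omega^2 \le 3\varepsilon\sum_{j=1}^k\frac{\omega_j\xi_j^2}{a_j^2} + 3\varepsilon\sum_{j=1}^k\frac{\omega_j\xi_j^2}{a_j^2}\Big(\frac{a_j}{X_j}\mathbb 1\{X_j^2\ge\sigma\} - 1\Big)^2 + 3\sum_{j=1}^k\omega_j[f]_j^2\Big(\frac{a_j}{X_j}\mathbb 1\{X_j^2\ge\sigma\} - 1\Big)^2 =: 3\{T^{(1)}_k + T^{(2)}_k + T^{(3)}_k\}.$$

Define the event

$$\Omega_\sigma := \Big\{\forall\,1\le j\le M^+_\sigma:\ \Big|\frac{X_j}{a_j} - 1\Big|\le\frac13\ \text{ and }\ X_j^2\ge\sigma\Big\}.$$

Since $\mathbb 1\{X_j^2\ge\sigma\}\mathbb 1_{\Omega_\sigma} = \mathbb 1_{\Omega_\sigma}$, it follows that for all $1\le j\le K^+_{\varepsilon,\sigma}$ we have

$$\Big(\frac{a_j}{X_j}\mathbb 1\{X_j^2\ge\sigma\} - 1\Big)^2\mathbb 1_{\Omega_\sigma} = \Big(\frac{a_j}{X_j} - 1\Big)^2\mathbb 1_{\Omega_\sigma}\le\frac14\mathbb 1_{\Omega_\sigma}.$$

Hence, $T^{(2)}_k\mathbb 1_{\Omega_\sigma}\le\frac14T^{(1)}_k$ for all $1\le k\le K^+_{\varepsilon,\sigma}$, and thus

$$\max_{1\le k\le K^+_{\varepsilon,\sigma}}\Big(\|\hat f_k - f_k\|_\omega^2 - \tfrac16\mathrm{pen}^a_k\Big)_+ \le \frac{15}{4}\sum_{k=1}^{K^+_{\varepsilon,\sigma}}\Big(\varepsilon\sum_{j=1}^k\frac{\omega_j\xi_j^2}{a_j^2} - 2\varepsilon\delta^a_k\Big)_+ + 3\max_{1\le k\le K^+_{\varepsilon,\sigma}}T^{(2)}_k\mathbb 1_{\Omega^c_\sigma} + 3\max_{1\le k\le K^+_{\varepsilon,\sigma}}T^{(3)}_k.$$

Keeping in mind that $\mathbb P[\Omega^c_\sigma]\le C(d)\sigma^2$ by virtue of Lemma A.6, the result follows immediately using Lemmas A.3, A.4 and A.5 below. □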



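Proof of Proposition 3.5. Let $\tilde f_k := \sum_{1\le j\le k}[f]_j\mathbb 1\{X_j^2\ge\sigma\}\phi_j$. It is easy to see that $\|\hat f_k - \tilde f_k\|_\omega^2\le\|\hat f_{k'} - \tilde f_{k'}\|_\omega^2$ for all $k'\ge k$ and $\|\tilde f_k - f\|_\omega^2\le\|f\|_\omega^2$ for all $k\ge1$. Thus, writing $\bar K := N^\circ\wedge\lfloor\sigma^{-1}\rfloor$ and using that $1\le\hat k\le\bar K$, we can write

$$\mathbb E\big[\|\hat f_{\hat k} - f\|_\omega^2\,\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\big] \le 2\Big\{\mathbb E\big[\|\hat f_{\bar K} - \tilde f_{\bar K}\|_\omega^2\,\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\big] + \|f\|_\omega^2\,\mathbb P[\mathcal U^c_{\varepsilon,\sigma}]\Big\}.$$

Moreover, using the Cauchy-Schwarz inequality, we conclude

$$\mathbb E\big[\|\hat f_{\bar K} - \tilde f_{\bar K}\|_\omega^2\,\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\big] \le 2\sum_{j\le\bar K}\omega_j\Big\{\mathbb E\Big[\frac{\varepsilon\xi_j^2}{X_j^2}\mathbb 1\{X_j^2\ge\sigma\}\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\Big] + \mathbb E\Big[\Big(\frac{a_j[f]_j}{X_j} - [f]_j\Big)^2\mathbb 1\{X_j^2\ge\sigma\}\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\Big]\Big\}$$
$$\le 2\sqrt3\,\Big\{\Big(\sigma^{-2}\max_{1\le j\le N^\circ}\omega_j\Big)\varepsilon + \|f\|_\omega^2\Big\}\,\mathbb P[\mathcal U^c_{\varepsilon,\sigma}]^{1/2},$$

which implies

$$\mathbb E\big[\|\hat f_{\hat k} - f\|_\omega^2\,\mathbb 1_{\mathcal U^c_{\varepsilon,\sigma}}\big] \le C\Big\{\Big(\sigma^{-2}\varepsilon\max_{1\le j\le N^\circ}\omega_j + \|f\|_\omega^2\Big)\mathbb P[\mathcal U^c_{\varepsilon,\sigma}]^{1/2} + \|f\|_\omega^2\,\mathbb P[\mathcal U^c_{\varepsilon,\sigma}]\Big\}.$$

Lemma A.6 below yields, for some $C > 0$ depending only on the class $\mathcal A_\lambda^d$, that the right-hand side is at most $C(1 + \|f\|_\omega^2)\,\sigma$, which completes the proof since $\|f\|_\omega^2\le\|f\|_\gamma^2\le r$ for $f\in\mathcal F_\gamma^r$. □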



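A.3. Auxiliary results

Lemma A.1 For every $j\in\mathbb N$,

(i) $R^{(1)}_j := \mathbb E\Big[\Big|\dfrac{a_j}{X_j} - 1\Big|^2\mathbb 1\{X_j^2\ge\sigma\}\Big] \le \min\Big\{1, \dfrac{8\sigma}{a_j^2}\Big\}$;

(ii) $R^{(2)}_j := \mathbb P[X_j^2 < \sigma] \le \min\Big\{1,\dfrac{4\sigma}{a_j^2}\Big\}$;

(iii) $\mathbb E\Big[\Big|\dfrac{a_j}{X_j}\Big|^2\mathbb 1\{X_j^2\ge\sigma\}\Big]\le 4$.

Proof. (i) It is easy to see that

$$R^{(1)}_j = \mathbb E\Big[\frac{|X_j - a_j|^2}{X_j^2}\mathbb 1\{X_j^2\ge\sigma\}\Big]\le\sigma^{-1}\mathrm{Var}(X_j) = 1.$$

On the other hand, using that $\mathbb E[(X_j - a_j)^4] = 3\sigma^2$, we obtain

$$R^{(1)}_j \le \mathbb E\Big[\Big(\frac{2(X_j - a_j)^2}{a_j^2} + \frac{2(X_j - a_j)^4}{a_j^2X_j^2}\Big)\mathbb 1\{X_j^2\ge\sigma\}\Big] \le \frac{2\,\mathbb E[(X_j - a_j)^4]}{\sigma a_j^2} + \frac{2\,\mathrm{Var}(X_j)}{a_j^2} = \frac{8\sigma}{a_j^2}. \tag{A.6}$$

Combining both bounds gives $R^{(1)}_j\le\min\{1, 8\sigma/a_j^2\}$, which completes the proof of (i).

(ii) Trivially, $R^{(2)}_j\le 1$. If $1\le 4\sigma/a_j^2$, then obviously $R^{(2)}_j\le\min\{1, 4\sigma/a_j^2\} = 1$. Otherwise, we have $\sigma < a_j^2/4$, so that $X_j^2 < \sigma$ implies $|X_j - a_j| > |a_j|/2$, and hence, using Chebyshev's inequality,

$$\mathbb P[X_j^2 < \sigma]\le\mathbb P[|X_j - a_j| > |a_j|/2]\le\frac{4\,\mathrm{Var}(X_j)}{a_j^2} = \min\Big\{1,\frac{4\sigma}{a_j^2}\Big\},$$

where we have used that $\mathrm{Var}(X_j) = \sigma$ for all $j$.

(iii) Since $a_j/X_j = 1 - (X_j - a_j)/X_j$, part (i) yields

$$\mathbb E\Big[\Big|\frac{a_j}{X_j}\Big|^2\mathbb 1\{X_j^2\ge\sigma\}\Big]\le 2\,\mathbb E\Big[\Big(1 + \frac{(X_j - a_j)^2}{X_j^2}\Big)\mathbb 1\{X_j^2\ge\sigma\}\Big]\le 4.$$

□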
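A quick Monte Carlo sanity check of Lemma A.1 for a single coordinate $X\sim\mathcal N(a,\sigma)$. It is only an illustration for the chosen parameters (sample size and seeds are arbitrary choices of ours) and does not replace the proof.

```python
import math
import random


def lemma_a1_estimates(a, sigma, n, rng):
    """Monte Carlo estimates, for X ~ N(a, sigma), of
    (i)   E[|a/X - 1|^2 1{X^2 >= sigma}],
    (ii)  P[X^2 < sigma],
    (iii) E[|a/X|^2 1{X^2 >= sigma}],
    to be compared with min(1, 8 sigma/a^2), min(1, 4 sigma/a^2) and 4."""
    s1 = s2 = s3 = 0.0
    sd = math.sqrt(sigma)
    for _ in range(n):
        x = a + sd * rng.gauss(0.0, 1.0)
        if x * x >= sigma:
            s1 += (a / x - 1.0) ** 2
            s3 += (a / x) ** 2
        else:
            s2 += 1.0
    return s1 / n, s2 / n, s3 / n
```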



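Lemma A.2 Under Assumption 2.1, with $N^+_\varepsilon := N^\varepsilon_{\sqrt{4d\lambda}}$ and $M^+_\sigma$ as in Proposition 3.3, we have that

(i) $\varepsilon\,\delta^\lambda_{N^+_\varepsilon}\le 32d^2$ for all $\varepsilon\in(0,1)$,

and there is a $\sigma_0\in(0,1)$ such that for all $\sigma<\sigma_0$ we have

(ii) $\min_{1\le j\le M^+_\sigma}\lambda_j/d\ge 3\sigma$.

Proof. (i) For $N^+_\varepsilon = 1$, we have $\varepsilon\delta^\lambda_{N^+_\varepsilon} = \varepsilon$ and there is nothing to show. If $N^+_\varepsilon\ge2$, one can show that $\omega^+_{N^+_\varepsilon}/\lambda_{N^+_\varepsilon}\le 4d/(\varepsilon N^+_\varepsilon|\log\varepsilon|)$, which we use in the following computation, distinguishing the cases $\log\varepsilon^{-1} + 2 > 4d$ and $\log\varepsilon^{-1} + 2\le 4d$: […], which implies $\varepsilon\delta^\lambda_{N^+_\varepsilon}\le 4d(4d + \log(4d))\le 32d^2$ for all $\varepsilon\in(0,1)$.

(ii) We have that

$$\min_{1\le j\le M^+_\sigma}\frac{\lambda_j}{d}\ge\frac{\sigma^{1-\nu_\sigma}}{4d^2}\ge 3\sigma,$$

where the last step holds for sufficiently small $\sigma$, as some algebra shows. □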



Lemma A. 3 We have that 



fe=i v i=i J 

Proof Representing the expectation of the positive random variable by the integral over its tail 
probabilities and using 5f, X]j=i( w j/ a |)' we mav write 



fc=l v j=l J 



+ k=l 



j=l J 



dx 



E 

k=l 



dx 
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Define pk := {eujk)/a 2 k , := 4eAg, and B k := 2e 2 Ylj=i UJ j/ a j- ^ can ^ e snown ( see proof of 
Proposition A.l in Dahlhaus and Polonik (2006)) that for all 1 ^ k' ^ k and m ^ 2, we have 



E 



(^(£-i>)' 



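Hence, the assumption of Theorem 2.8 from Petrov (1995) is satisfied and, splitting up the integral, we get the following bound:

$$
\sum_{k=1}^{\infty} \mathbb{E}\Big[\Big(\sum_{j=1}^{k} \cdots\Big)_+\Big] \le \sum_{k=1}^{\infty} \Big( \int_{\varepsilon\delta_k^{\lambda}}^{B_k/H_k} \exp\Big(-\frac{x^2}{4B_k}\Big)\,dx + \int_{B_k/H_k}^{\infty} \exp\Big(-\frac{x}{4H_k}\Big)\,dx \Big).
$$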



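The second integral is equal to $4H_k \exp(-B_k/(4H_k^2))$. Some computation shows that the first one is bounded from above by $4H_k\big[\exp(-\varepsilon^2(\delta_k^{\lambda})^2/(4B_k)) - \exp(-B_k/(4H_k^2))\big]$. Thus, the two identical terms cancel, and we get

$$
\sum_{k=1}^{\infty} \mathbb{E}\Big[\Big(\sum_{j=1}^{k} \cdots\Big)_+\Big] \le 16\varepsilon \sum_{k=1}^{\infty} \Delta_k^{\lambda} \exp\Big(-\frac{(\delta_k^{\lambda})^2}{8k(\Delta_k^{\lambda})^2}\Big).
$$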
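The closed form of the second integral, $\int_{B/H}^{\infty} e^{-x/(4H)}\,dx = 4H e^{-B/(4H^2)}$, can be confirmed numerically. The sketch below uses a plain composite Simpson rule (our choice, not the paper's). For the Gaussian piece it only illustrates the elementary bound $\int_a^b e^{-x^2/(4B)}\,dx \le (2B/a)[e^{-a^2/(4B)} - e^{-b^2/(4B)}]$ for $0 < a \le b$, which follows from $x/a \ge 1$ on $[a,b]$. This is a standard step and not necessarily the computation the authors had in mind. All names and values are hypothetical.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0


def second_integral_numeric(B, H):
    """Approximates int_{B/H}^inf exp(-x/(4H)) dx; the tail beyond the cut-off is e^{-60} times the value."""
    lo = B / H
    hi = lo + 240.0 * H
    return simpson(lambda x: math.exp(-x / (4.0 * H)), lo, hi)


def second_integral_closed_form(B, H):
    """4 H exp(-B / (4 H^2)), the value stated in the text."""
    return 4.0 * H * math.exp(-B / (4.0 * H ** 2))


def gaussian_piece_numeric(a, b, B):
    """int_a^b exp(-x^2/(4B)) dx by Simpson's rule."""
    return simpson(lambda x: math.exp(-x * x / (4.0 * B)), a, b)


def gaussian_piece_bound(a, b, B):
    """(2B/a)[exp(-a^2/(4B)) - exp(-b^2/(4B))], valid for 0 < a <= b."""
    return (2.0 * B / a) * (math.exp(-a * a / (4.0 * B)) - math.exp(-b * b / (4.0 * B)))
```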



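To complete the proof, we bound the sum on the right-hand side as follows:

$$
\begin{aligned}
\sum_{k=1}^{\infty} \Delta_k^{\lambda} \exp\Big(-\frac{(\delta_k^{\lambda})^2}{8k(\Delta_k^{\lambda})^2}\Big)
&\le \sum_{k=1}^{\infty} \exp\Big(\log\big(\Delta_k^{\lambda} \vee (k+2)\big) - \cdots\Big)
\le e \sum_{k=1}^{\infty} \exp\Big(-\frac{k}{8\log(k+2)}\Big)\\
&\le e \sum_{k=1}^{\infty} \exp\Big(-\frac{\sqrt{k}}{8\log 3}\Big)
\le e \int_0^{\infty} \exp\Big(-\frac{\sqrt{x}}{8\log 3}\Big)\,dx = 128\log^2(3)\,e,
\end{aligned}
$$

where we have used $\log(k+2) \le \log(3)\sqrt{k}$ for all $k \ge 1$.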
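The last three steps can be checked numerically. The inequality $\log(k+2) \le \log(3)\sqrt{k}$ makes $\exp(-k/(8\log(k+2))) \le \exp(-\sqrt{k}/(8\log 3))$ termwise. Because this summand is decreasing, the sum over $k \ge 1$ is bounded by the integral over $[0,\infty)$. The substitution $x = u^2$ gives $\int_0^\infty e^{-\sqrt{x}/c}\,dx = 2c^2$, which equals $128\log^2 3$ for $c = 8\log 3$. The helper names and the truncation point $10^5$ below are hypothetical choices.

```python
import math

LOG3 = math.log(3.0)
SCALE = 8.0 * LOG3  # the constant 8 log 3 in the exponent


def log_inequality_holds(k_max=100_000):
    """Checks log(k + 2) <= log(3) * sqrt(k) for k = 1, ..., k_max."""
    return all(math.log(k + 2) <= LOG3 * math.sqrt(k) for k in range(1, k_max + 1))


def partial_sum(k_max=100_000):
    """sum_{k=1}^{k_max} exp(-sqrt(k) / (8 log 3)); the neglected tail is below 1e-11."""
    return sum(math.exp(-math.sqrt(k) / SCALE) for k in range(1, k_max + 1))


def integral_value():
    """int_0^inf exp(-sqrt(x)/c) dx = 2 c^2 with c = 8 log 3, i.e. 128 log^2(3)."""
    return 2.0 * SCALE ** 2


if __name__ == "__main__":
    print(log_inequality_holds(), partial_sum(), integral_value())
```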
□

Lemma A.4. For every $k \in \mathbb{N}$ and $\sigma \in (0,1)$,

$$
\mathbb{E}\big[\cdots\big] \le 8d\,r(\gamma, \Lambda, \omega).
$$



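Proof. Firstly, as $f \in \mathcal{F}_\gamma^\rho$, it is easily seen that

$$
\mathbb{E}\big[\cdots\big] \le \cdots\,\mathbb{E}\big[|R_j|^2\big],
$$

where $R_j$ is defined as

$$
R_j := \frac{\lambda_j}{X_j}\,\mathbb{1}\{X_j^2 \ge \sigma\} - 1. \tag{A.7}
$$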
In view of the definition of $K_\sigma$ in Theorem 2.3, the result follows from $\mathbb{E}[|R_j|^2] \lesssim d\,\min\{1, \sigma/\lambda_j^2\}$, which is a consequence of the decomposition

$$
\mathbb{E}\big[|R_j|^2\big] = \mathbb{E}\Big[\Big(\frac{\lambda_j}{X_j} - 1\Big)^2 \mathbb{1}\{X_j^2 \ge \sigma\}\Big] + \mathbb{P}\big[X_j^2 < \sigma\big] \tag{A.8}
$$

and Lemma A.1.

□



Lemma A.5. We have that

$$
\mathbb{E}\Big[\sum_{j=1}^{K^+_{\varepsilon,\sigma}} \cdots\Big] \le 64\,d^3\,\big(\mathbb{P}[\cdots]\big)^{1/2}.
$$



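Proof. Given $R_j$ from (A.7), we begin our proof by observing that

$$
\mathbb{E}\Big[\sum_{j=1}^{\cdots} \cdots\Big] = \cdots,
$$

where we have used the independence of $X$ and $Y$ and $\operatorname{Var}(Y_j) = \varepsilon$. Since $d\,\delta_k \ge \sum_{j=1}^{k} \omega_j/\lambda_j^2$ for all $k$ and all $\lambda \in \Lambda_d$, the Cauchy-Schwarz inequality yields

$$
\mathbb{E}\big[\cdots\big] \le d\,\big(\mathbb{P}[\cdots]\big)^{1/2}\,\varepsilon\,\delta_{N_\varepsilon^+} \max_{0 < j \le N_\varepsilon^+} \big(\mathbb{E}[|R_j|^4]\big)^{1/2}.
$$

Proceeding analogously to (A.6) and (A.8), one can show that $\mathbb{E}[|R_j|^4] \le 4$. The result then follows using the definition of $N_\varepsilon^+$.

□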
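Assuming that the final step uses the bound $\varepsilon\,\delta_{N_\varepsilon^+} \le 32d^2$ from Lemma A.2 (i), which is itself derived from the definition of $N_\varepsilon^+$, the constant $64d^3$ in Lemma A.5 can be traced as follows:

$$
d\,\big(\mathbb{P}[\cdots]\big)^{1/2}\,\varepsilon\,\delta_{N_\varepsilon^+} \max_{0 < j \le N_\varepsilon^+} \big(\mathbb{E}[|R_j|^4]\big)^{1/2}
\le d \cdot 32d^2 \cdot \sqrt{4}\,\big(\mathbb{P}[\cdots]\big)^{1/2}
= 64\,d^3\,\big(\mathbb{P}[\cdots]\big)^{1/2}.
$$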

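Lemma A.6. For $k \in \mathbb{N}$, define the events

$$
\Omega_k := \Big\{\Big|\frac{X_j}{\lambda_j} - 1\Big| \le \frac{1}{3}\ \ \forall\, 1 \le j \le k\Big\}
$$

and suppose that Assumption 2.1 holds. For all $\varepsilon, \sigma \in (0,1)$, we have

(i) $\Omega_\sigma \subset \{\mathrm{pen}_k^- \le \mathrm{pen}_k \le 30\,\mathrm{pen}_k^+\ \ \forall\, 1 \le k \le K^+_{\varepsilon,\sigma}\}$,

(iii) $\mathbb{P}[\Omega^c_{M_\sigma^+}] \le C(d)\,\sigma^2$ and $\mathbb{P}[\Omega_\sigma^c] \le C(d)\,\sigma^2$.

If additionally condition (3.6) holds, then

(iv) $\mathbb{P}[\cdots] \le C(\Lambda, d)\,\sigma^6$.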

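Proof. Consider (i). Notice first that $\delta_k^{\lambda} \le \delta_k \le \zeta_d\,\delta_k^{\lambda}$ for all $k \ge 1$ with $\zeta_d := \log(3d)/\log 3$. Observe that on $\Omega_\sigma$ we have $(1/2)\Delta_k^{\lambda} \le \Delta_k^{X} \le (3/2)\Delta_k^{\lambda}$ for all $1 \le k \le M_\sigma^+$ and hence $(1/2)[\Delta_k^{\lambda} \vee (k+2)] \le [\Delta_k^{X} \vee (k+2)] \le (3/2)[\Delta_k^{\lambda} \vee (k+2)]$, which implies

$$
\frac{\log(\Delta_k^{\lambda} \vee [k+2])}{\log(k+2)} - \frac{\log 2}{\log(k+2)}
\le \frac{\log(\Delta_k^{X} \vee [k+2])}{\log(k+2)}
\le \frac{\log(\Delta_k^{\lambda} \vee [k+2])}{\log(k+2)} + \frac{\log(3/2)}{\log(k+2)}.
$$

Using $\log(\Delta_k^{\lambda} \vee (k+2))/\log(k+2) \ge 1$, we conclude from the last estimate that

$$
\frac{\delta_k^{\lambda}}{10} \le \frac{\log(3/2)}{2\log 3}\,\delta_k^{\lambda}
\le \frac{1}{2}\,\delta_k^{\lambda}\Big[1 - \frac{\log 2}{\log(k+2)}\Big]
\le \delta_k^{X}
\le \frac{3}{2}\,\delta_k^{\lambda}\Big[1 + \frac{\log(3/2)}{\log(k+2)}\Big]
\le 3\,\delta_k^{\lambda}.
$$

It follows that on $\Omega_\sigma$ we have $\mathrm{pen}_k^- \le \mathrm{pen}_k \le 30\,\mathrm{pen}_k^+$ for all $1 \le k \le K^+_{\varepsilon,\sigma}$, as desired.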
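The numerical constants in the last chain can be checked directly. The lower factor $\frac12[1 - \log 2/\log(k+2)]$ increases in $k$ and equals $\log(3/2)/(2\log 3) \approx 0.18 \ge 1/10$ at $k = 1$. The upper factor $\frac32[1 + \log(3/2)/\log(k+2)]$ decreases in $k$ and stays below $3$. The sketch checks only these constants over a hypothetical range of $k$; it does not model $\Delta_k^{X}$ or $\delta_k^{X}$ themselves.

```python
import math

LOWER_CONST = math.log(1.5) / (2.0 * math.log(3.0))  # log(3/2) / (2 log 3)


def lower_factor(k):
    """(1/2) * [1 - log 2 / log(k + 2)]"""
    return 0.5 * (1.0 - math.log(2.0) / math.log(k + 2))


def upper_factor(k):
    """(3/2) * [1 + log(3/2) / log(k + 2)]"""
    return 1.5 * (1.0 + math.log(1.5) / math.log(k + 2))


def constants_ok(k_max=10_000, tol=1e-12):
    """Checks 1/10 <= LOWER_CONST <= lower_factor(k) and upper_factor(k) <= 3 for 1 <= k <= k_max."""
    if LOWER_CONST < 0.1:
        return False
    return all(lower_factor(k) >= LOWER_CONST - tol and upper_factor(k) <= 3.0
               for k in range(1, k_max + 1))
```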



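Proof of (ii). Denoting by $X$ the random sequence $(X_j)_{j \ge 1}$, define $\widehat N_\varepsilon := N_\varepsilon^{X}$ and $\widehat M_\sigma := M_\sigma^{X}$. Note that, by definition, $K^-_{\varepsilon,\sigma} = N^-_\varepsilon \wedge M^-_\sigma$ and $\widehat K_{\varepsilon,\sigma} = \widehat N_\varepsilon \wedge \widehat M_\sigma$. Define further the events $\Omega_I := \{K^-_{\varepsilon,\sigma} > \widehat K_{\varepsilon,\sigma}\}$ and $\Omega_{II} := \{\widehat K_{\varepsilon,\sigma} > K^+_{\varepsilon,\sigma}\}$. Then we have $\{K^-_{\varepsilon,\sigma} \le \widehat K_{\varepsilon,\sigma} \le K^+_{\varepsilon,\sigma}\}^c = \Omega_I \cup \Omega_{II}$. Consider $\Omega_I = \{\widehat N_\varepsilon < K^-_{\varepsilon,\sigma}\} \cup \{\widehat M_\sigma < K^-_{\varepsilon,\sigma}\}$ first.

By definition of $N_\varepsilon^-$, we have $\min_{1 \le j \le N_\varepsilon^-} (\cdots) \ge 4\varepsilon|\log\varepsilon|$, which implies

$$
\{\widehat N_\varepsilon < K^-_{\varepsilon,\sigma}\}
\subset \bigcup_{1 \le j \le K^-_{\varepsilon,\sigma}} \{\cdots < \varepsilon|\log\varepsilon|\}
\subset \bigcup_{1 \le j \le K^-_{\varepsilon,\sigma}} \Big\{X_j^2 < \frac{\lambda_j^2}{4}\Big\}
\subset \bigcup_{1 \le j \le K^-_{\varepsilon,\sigma}} \Big\{\Big|\frac{X_j}{\lambda_j} - 1\Big| > \frac{1}{2}\Big\}.
$$

One can see that from $\min_{1 \le j \le M_\sigma^-} \lambda_j^2 \ge 4\,(\cdots)$ it follows in the same way that

$$
\{\widehat M_\sigma < K^-_{\varepsilon,\sigma}\} \subset \bigcup_{1 \le j \le K^-_{\varepsilon,\sigma}} \Big\{\Big|\frac{X_j}{\lambda_j} - 1\Big| > \frac{1}{2}\Big\}.
$$

Therefore, $\Omega_I \subset \bigcup_{1 \le j \le K^-_{\varepsilon,\sigma}} \{|X_j/\lambda_j - 1| > 1/2\} \subset \Omega^c_{M_\sigma^+}$, since $M_\sigma^- \le M_\sigma^+$.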

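Consider $\Omega_{II} = \{\widehat N_\varepsilon > K^+_{\varepsilon,\sigma}\} \cap \{\widehat M_\sigma > K^+_{\varepsilon,\sigma}\}$. In case $K^+_{\varepsilon,\sigma} = N^+_\varepsilon$, note that by definition of $N^+_\varepsilon$, we have $\varepsilon|\log\varepsilon|/4 \ge \cdots$, such that

$$
\Omega_{II} \subset \{\widehat N_\varepsilon > N_\varepsilon^+\}
\subset \{\forall\, 1 \le j \le N_\varepsilon^+ + 1 : \cdots \ge \varepsilon|\log\varepsilon|\}
\subset \Big\{\Big|\frac{X_{N_\varepsilon^++1}}{\lambda_{N_\varepsilon^++1}}\Big| \ge 2\Big\}
\subset \Big\{\Big|\frac{X_{N_\varepsilon^++1}}{\lambda_{N_\varepsilon^++1}} - 1\Big| \ge 1\Big\}.
$$

In case $K^+_{\varepsilon,\sigma} = M^+_\sigma$, it follows analogously from $\cdots \ge 4\max_{j \ge M_\sigma^+ + 1} \lambda_j^2$ that

$$
\Omega_{II} \subset \{\widehat M_\sigma > M_\sigma^+\} \subset \Big\{\Big|\frac{X_{M_\sigma^++1}}{\lambda_{M_\sigma^++1}} - 1\Big| \ge 1\Big\}.
$$

Therefore, we have $\Omega_{II} \subset \{|X_{K^+_{\varepsilon,\sigma}+1}/\lambda_{K^+_{\varepsilon,\sigma}+1} - 1| \ge 1\} \subset \Omega^c_{M_\sigma^++1}$, and (ii) is shown.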
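Both parts of the proof of (ii) end with an elementary inclusion for the ratio $r = X_j/\lambda_j$. If $X_j^2 < \lambda_j^2/4$, then $|r| < 1/2$ and so $|r - 1| > 1/2$. If $|r| \ge 2$, then $|r - 1| \ge |r| - 1 \ge 1$. The randomized check below searches for counterexamples over hypothetical sampling ranges. It illustrates the inclusions and is no substitute for the one-line arguments.

```python
import random


def check_inclusions(n=100_000, seed=1):
    """Randomized check of
    (a) X^2 < lam^2 / 4  implies  |X/lam - 1| > 1/2,
    (b) |X/lam| >= 2     implies  |X/lam - 1| >= 1.
    Returns True if no counterexample is found."""
    rng = random.Random(seed)
    for _ in range(n):
        lam = rng.choice([-1.0, 1.0]) * rng.uniform(1e-3, 10.0)  # nonzero lam of either sign
        x = rng.uniform(-30.0, 30.0)
        r = x / lam
        if x * x < lam * lam / 4.0 and not abs(r - 1.0) > 0.5:
            return False
        if abs(r) >= 2.0 and not abs(r - 1.0) >= 1.0:
            return False
    return True
```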

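Proof of (iii). For $Z \sim \mathcal{N}(0,1)$ and $z > 0$, one has $\mathbb{P}[Z > z] \le (2\pi z^2)^{-1/2}\exp(-z^2/2)$. Hence, there is a constant $C(d)$ depending on $d$ such that for every $1 \le j \le M_\sigma^+$,

$$
\mathbb{P}\big[|X_j/\lambda_j - 1| > 1/3\big] \le C(d)\,\Big(\frac{\sigma}{\cdots}\Big)^{1/2} \exp\Big(-\frac{\cdots}{18\,\sigma d}\Big).
$$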
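For Gaussian $X_j \sim \mathcal{N}(\lambda_j, \sigma)$, the event $\{|X_j/\lambda_j - 1| > 1/3\}$ equals $\{|Z| > |\lambda_j|/(3\sqrt{\sigma})\}$. With $z = |\lambda_j|/(3\sqrt{\sigma})$ one has $z^2/2 = \lambda_j^2/(18\sigma)$, which is where the factor $18$ in the exponent comes from. The sketch compares the exact tail, computed with `math.erfc`, against the Gaussian tail bound quoted above. The helper names and test values are hypothetical.

```python
import math


def normal_upper_tail(z):
    """Exact P[Z > z] for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mills_bound(z):
    """Gaussian tail bound (2 pi z^2)^{-1/2} exp(-z^2 / 2), valid for z > 0."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi * z * z)


def relative_deviation_prob(lam, sigma):
    """Exact P[|X/lam - 1| > 1/3] for X ~ N(lam, sigma), i.e. P[|Z| > |lam| / (3 sqrt(sigma))]."""
    z = abs(lam) / (3.0 * math.sqrt(sigma))
    return 2.0 * normal_upper_tail(z)


def relative_deviation_bound(lam, sigma):
    """Two-sided tail bound; its exponent is z^2 / 2 = lam^2 / (18 sigma)."""
    z = abs(lam) / (3.0 * math.sqrt(sigma))
    return 2.0 * mills_bound(z)
```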
Consequently, as $M_\sigma^+ \le \sigma^{-1}$ and $\cdots$, we have

$$
\mathbb{P}\big[\Omega^c_{M_\sigma^+}\big] \le C(d)\,\sigma^{2-\cdots} \exp\Big(-\frac{\cdots}{72\,d^2}\Big),
$$

which implies $\mathbb{P}[\Omega^c_{M_\sigma^+}] \le C(d)\,\sigma^2$, using the behaviour of $\cdots|\log\sigma|$ as $\sigma \to 0$. As for the second assertion in (iii), we distinguish the cases $\sigma \le \sigma_0$ and $\sigma > \sigma_0$, where $\sigma_0$ is the constant from Lemma A.2 (ii) depending only on $d$. The assertion is trivial for $\sigma > \sigma_0$ (keeping in mind that $\mathbb{P}[\Omega_\sigma^c] \le \sigma_0^{-2}\sigma^2$).
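Consider the case $\sigma \le \sigma_0$, where $\lambda_j^2 \ge 3\sigma$ for all $1 \le j \le M_\sigma^+$ due to Lemma A.2 (ii). This yields for the complement of $\Omega_\sigma$

$$
\Omega_\sigma^c \subset \Big\{\exists\, 1 \le j \le M_\sigma^+ : \cdots \ \text{or}\ X_j^2 < \sigma\Big\}
\subset \Big\{\exists\, 1 \le j \le M_\sigma^+ : \Big|\frac{X_j}{\lambda_j} - 1\Big| > \frac{1}{3}\Big\}.
$$

It follows with assertion (ii) that $\Omega_\sigma^c \subset \Omega^c_{M_\sigma^+}$ for all $\sigma \le \sigma_0$, implying the second assertion of (iii).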
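The step from $X_j^2 < \sigma$ to $|X_j/\lambda_j - 1| > 1/3$ uses $\lambda_j^2 \ge 3\sigma$. Together these give $|X_j/\lambda_j| < 1/\sqrt{3} \approx 0.577$ and hence $|X_j/\lambda_j - 1| > 1 - 1/\sqrt{3} > 1/3$. The randomized check below illustrates this over hypothetical ranges for $\sigma$, $\lambda_j$ and $X_j$.

```python
import random


def check_small_square_inclusion(n=100_000, seed=2):
    """If lam^2 >= 3 sigma and X^2 < sigma, then |X/lam| < 1/sqrt(3), hence |X/lam - 1| > 1/3."""
    rng = random.Random(seed)
    for _ in range(n):
        sigma = rng.uniform(1e-4, 1.0)
        # lam^2 >= 3 sigma by construction, with either sign
        lam = rng.choice([-1.0, 1.0]) * (3.0 * sigma) ** 0.5 * rng.uniform(1.0, 10.0)
        x = rng.uniform(-1.0, 1.0) * sigma ** 0.5  # candidate with X^2 <= sigma
        if x * x < sigma and not abs(x / lam - 1.0) > 1.0 / 3.0:
            return False
    return True
```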
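Proof of (iv). Following the proof of (iii) and using that $\cdots$, we obtain

$$
\mathbb{P}\big[\cdots\big] \le \cdots \exp\Big(-\frac{\cdots}{18\,\sigma d}\Big). \tag{A.9}
$$

Note that $\cdots \subset \Omega_\sigma$, since trivially $\Omega_{M_\sigma^++1} \subset \Omega_{M_\sigma^+}$. Thus, (A.9) implies assertion (iv) by virtue of condition (3.6).

□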